Testing resample with a different timezone #5

attilapiros · 2023-08-07T18:12:37Z

TEST

attilapiros · 2023-08-08T01:24:47Z

failed with:

Running tests...
----------------------------------------------------------------------
timezone: UTC
  test_dataframe_resample (pyspark.pandas.tests.test_resample.ResampleTests) ... Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/__w/spark/spark/python/pyspark/pandas/groupby.py:893: FutureWarning: Default value of `numeric_only` will be changed to `False` instead of `True` in 4.0.0.
  warnings.warn(
/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/__w/spark/spark/python/pyspark/pandas/groupby.py:893: FutureWarning: Default value of `numeric_only` will be changed to `False` instead of `True` in 4.0.0.
  warnings.warn(
/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/__w/spark/spark/python/pyspark/pandas/groupby.py:649: FutureWarning: Default value of `numeric_only` will be changed to `False` instead of `True` in 4.0.0.
  warnings.warn(
/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
FAIL (26.468s)
  test_missing (pyspark.pandas.tests.test_resample.ResampleTests) ... ok (0.133s)
  test_resample_error (pyspark.pandas.tests.test_resample.ResampleTests) ... ok (2.493s)
  test_resample_on (pyspark.pandas.tests.test_resample.ResampleTests) ... /__w/spark/spark/python/pyspark/pandas/groupby.py:893: FutureWarning: Default value of `numeric_only` will be changed to `False` instead of `True` in 4.0.0.
  warnings.warn(
/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
ok (1.922s)
  test_series_resample (pyspark.pandas.tests.test_resample.ResampleTests) ... /__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas Series is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas Series is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/__w/spark/spark/python/pyspark/pandas/groupby.py:893: FutureWarning: Default value of `numeric_only` will be changed to `False` instead of `True` in 4.0.0.
  warnings.warn(
/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas Series is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
FAIL (4.219s)

======================================================================
FAIL [26.468s]: test_dataframe_resample (pyspark.pandas.tests.test_resample.ResampleTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_resample.py", line 269, in test_dataframe_resample
    self._test_resample(self.pdf4, self.psdf4, ["11H", "21D"], "left", None, "mean")
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_resample.py", line 259, in _test_resample
    self.assert_eq(
  File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 457, in assert_eq
    _assert_pandas_almost_equal(lobj, robj)
  File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 171, in _assert_pandas_almost_equal
    raise PySparkAssertionError(
pyspark.errors.exceptions.base.PySparkAssertionError: [DIFFERENT_PANDAS_DATAFRAME] DataFrames are not almost equal:
Left:
                            A         B
A    float64
B    float64
dtype: object
Right:
                            A         B
A    float64
B    float64
dtype: object

======================================================================
FAIL [4.219s]: test_series_resample (pyspark.pandas.tests.test_resample.ResampleTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_resample.py", line 276, in test_series_resample
    self._test_resample(self.pdf3.A, self.psdf3.A, ["1001H"], "right", "right", "sum")
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_resample.py", line 259, in _test_resample
    self.assert_eq(
  File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 457, in assert_eq
    _assert_pandas_almost_equal(lobj, robj)
  File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 228, in _assert_pandas_almost_equal
    raise PySparkAssertionError(
pyspark.errors.exceptions.base.PySparkAssertionError: [DIFFERENT_PANDAS_SERIES] Series are not almost equal:
Left:
Freq: 1001H
float64
Right:
float64

----------------------------------------------------------------------
Ran 5 tests in 35.235s

FAILED (failures=2)

Generating XML reports...
Generated XML report: target/test-reports/TEST-pyspark.pandas.tests.test_resample.ResampleTests-20230808005957.xml

attilapiros · 2023-08-08T03:32:49Z

After setting the conf "spark.sql.timestampType" to "TIMESTAMP_NTZ":

======================================================================
FAIL [31.935s]: test_dataframe_resample (pyspark.pandas.tests.test_resample.ResampleTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_resample.py", line 269, in test_dataframe_resample
    self._test_resample(self.pdf4, self.psdf4, ["11H", "21D"], "left", None, "mean")
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_resample.py", line 259, in _test_resample
    self.assert_eq(
  File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 457, in assert_eq
    _assert_pandas_almost_equal(lobj, robj)
  File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 171, in _assert_pandas_almost_equal
    raise PySparkAssertionError(
pyspark.errors.exceptions.base.PySparkAssertionError: [DIFFERENT_PANDAS_DATAFRAME] DataFrames are not almost equal:
Left:
                            A         B
A    float64
B    float64
dtype: object
Right:
                            A         B
A    float64
B    float64
dtype: object

======================================================================
FAIL [4.803s]: test_series_resample (pyspark.pandas.tests.test_resample.ResampleTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_resample.py", line 276, in test_series_resample
    self._test_resample(self.pdf3.A, self.psdf3.A, ["1001H"], "right", "right", "sum")
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_resample.py", line 259, in _test_resample
    self.assert_eq(
  File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 457, in assert_eq
    _assert_pandas_almost_equal(lobj, robj)
  File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 228, in _assert_pandas_almost_equal
    raise PySparkAssertionError(
pyspark.errors.exceptions.base.PySparkAssertionError: [DIFFERENT_PANDAS_SERIES] Series are not almost equal:
Left:
Freq: 1001H
float64
Right:
float64

----------------------------------------------------------------------
Ran 5 tests in 42.031s

FAILED (failures=2)
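
For context on why the timezone and the timestamp type are being varied in these runs, here is a minimal standard-library sketch of how bucketing a timestamp can depend on timezone handling. It is not a diagnosis of the failures above and does not use Spark or pandas. The helper `floor_to_day` and the fixed UTC-8 offset are hypothetical, chosen only for illustration. A timezone-aware value stands in for an instant rendered in a session timezone, and a naive value stands in for a timestamp without time zone (as with `TIMESTAMP_NTZ`).

```python
from datetime import datetime, timedelta, timezone


def floor_to_day(ts: datetime) -> datetime:
    """Hypothetical helper: truncate a timestamp to the start of its wall-clock day."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


# One instant, expressed as a timezone-aware value in UTC.
instant = datetime(2023, 8, 8, 1, 24, 47, tzinfo=timezone.utc)

# Assumption for illustration only: a fixed UTC-8 offset as the "other" timezone.
other_tz = timezone(timedelta(hours=-8))

# The same instant lands in different daily bins depending on the timezone
# used to render its wall-clock time.
utc_bin = floor_to_day(instant)                           # bin starts on 2023-08-08
shifted_bin = floor_to_day(instant.astimezone(other_tz))  # bin starts on 2023-08-07

# A naive value has no timezone attached, so its bin is the same
# regardless of any timezone setting.
naive_bin = floor_to_day(datetime(2023, 8, 8, 1, 24, 47))  # bin starts on 2023-08-08

print(utc_bin.isoformat())
print(shifted_bin.isoformat())
print(naive_bin.isoformat())
```

If the two sides of a comparison bucket the same data under different timezone assumptions, the resulting bins can differ by a day as shown; whether that is what happens in `test_resample.py` is not established by the logs above.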

github-actions bot added PYTHON PANDAS API ON SPARK labels Aug 7, 2023

attilapiros force-pushed the test-resample-with-tz branch 2 times, most recently from 8b231e7 to 630bbce Compare August 8, 2023 00:05

attilapiros force-pushed the test-resample-with-tz branch from d53df81 to e36d4c6 Compare August 16, 2023 21:21

github-actions bot added CORE INFRA BUILD DOCS SQL AVRO ML STRUCTURED STREAMING R YARN KUBERNETES WEB UI CONNECT labels Aug 16, 2023

attilapiros force-pushed the test-resample-with-tz branch from e36d4c6 to e0abf77 Compare August 16, 2023 22:42

TEST

4fdb627

attilapiros force-pushed the test-resample-with-tz branch from e0abf77 to 4fdb627 Compare August 16, 2023 22:44

attilapiros removed CORE PYTHON INFRA BUILD DOCS SQL AVRO labels Aug 16, 2023

attilapiros removed ML STRUCTURED STREAMING R YARN KUBERNETES WEB UI CONNECT labels Aug 16, 2023

github-actions bot added the PYTHON label Aug 16, 2023
